MG205: Econometrics Theory and Applications

Topic 6: Limitations of Linear Regression

José Ignacio González-Rojas

London School of Economics and Political Science

December 1, 2025

Today We Examine When OLS Fails

From Functional Forms to Model Limitations

Last week covered

  • Polynomial terms for non-linear relationships
  • Interaction terms when effects depend on context
  • Log transformations for elasticities
  • F-tests for multiple restrictions

Today’s focus

  • When does omitted variable bias matter?
  • Prediction vs causal inference
  • Reverse causality and simultaneous equations
  • Sample selection and non-random sampling

Understanding when OLS breaks down shapes how we interpret every regression

Exercise 1: Does a Commodities Trader Care About Omitted Variables?

A Trader Predicts Tomorrow’s Oil Price

A commodities trader builds a model to forecast tomorrow’s oil price

\[\widehat{\text{oil price}}_{t+1} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{inventory}_{t}\]

She omits variables like geopolitical tensions, OPEC decisions, and dollar strength.

Should she worry about omitted variable bias?

The answer depends entirely on her objective

Two Models Serve Fundamentally Different Purposes

Forecasting vs Understanding Causation

Linear Projection (Prediction)

\[y = x'\beta + e\]

\[\mathbb{E}[x \cdot e] = 0\]

  • Finds best linear predictor
  • Orthogonality holds by construction
  • Goal: minimise forecast error
  • Include anything that helps predict \(y\)

Linear Regression (Causal)

\[y = x'\beta + e\]

\[\mathbb{E}[e | x] = 0\]

  • Estimates causal effects
  • Exogeneity is an assumption
  • Goal: understand how \(x\) affects \(y\)
  • Include only confounders

Orthogonality Is Weaker Than Exogeneity

The Mathematical Distinction

Projection guarantees unconditional orthogonality

\[\mathbb{E}[x \cdot e] = 0\]

  • Errors uncorrelated with \(x\) on average
  • Automatic from minimising squared prediction error

Causal regression requires conditional mean independence

\[\mathbb{E}[e | x] = 0\]

  • Error has zero mean at every value of \(x\)
  • This is an assumption about the world

Exogeneity \(\implies\) Orthogonality, but not the reverse
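The gap between the two conditions can be checked numerically. In this sketch (a hypothetical distribution, not course data), the error \(e = x^2 - 1\) is uncorrelated with \(x\), yet its conditional mean clearly depends on \(x\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
e = x**2 - 1  # mean zero and uncorrelated with x, but E[e|x] = x^2 - 1

orth = np.mean(x * e)             # ~0: unconditional orthogonality holds
cond = np.mean(e[np.abs(x) > 2])  # well above 0: E[e|x] != 0 in the tails
print(orth, cond)
```

So orthogonality can hold while conditional mean independence fails, which is exactly why exogeneity is the stronger assumption.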

The Trader Cares About Accuracy, Not Causation

Why Omitted Variables Don’t Matter for Prediction

Her objective

  • Predict \(\text{oil price}_{t+1}\) accurately
  • Minimise \((y_{t+1} - \hat{y}_{t+1})^2\)
  • Make profitable trades

What projection delivers

  • Best linear forecast given information
  • Optimal combination of available data
  • Correlation is sufficient

What she does NOT need

  • Causal effect of inventory on price
  • Understanding of market mechanisms
  • Unbiased structural parameters

What happens when we omit OPEC decisions?

  • May reduce forecast accuracy
  • But doesn’t create “bias”
  • Her \(\hat{\beta}_1\) captures predictive relationship

Omitted variable bias is a causal concept—it has no meaning in pure prediction
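A small simulation (with invented coefficients) illustrates why omitting a correlated predictor is harmless for forecasting: the short regression converges to the projection coefficient, which is by construction the best linear predictor given the included variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x1 = rng.standard_normal(n)                              # included predictor
x2 = 0.5 * x1 + np.sqrt(0.75) * rng.standard_normal(n)   # omitted, corr = 0.5
y = 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)

# projection of y on x1 alone
b_short = np.cov(x1, y)[0, 1] / np.var(x1)
print(b_short)  # ~1.5 = beta1 + beta2 * cov(x1, x2) / var(x1)
```

The coefficient 1.5 is not the causal effect of x1 (which is 1.0), but it is the correct weight for forecasting y from x1 alone.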

Identification, Estimation, and Inference

Why the Distinction Matters for Our Trader

Goal                            Requires Causation?   Model Needed
Predict oil price tomorrow      No                    Projection
Understand what drives prices   Yes                   Regression

The trader’s situation

  • She wants accurate forecasts, not causal mechanisms
  • Omitting OPEC decisions may reduce accuracy, but doesn’t create “bias”
  • Projection coefficients capture predictive relationships—that’s enough

For forecasting, causal identification is irrelevant

Exercise 2: An NGO Targets Its Campaign

Reaching the Maximum Number of People

The NGO’s Problem

An NGO wants to maximise reach for a health campaign. They model:

\[\widehat{\text{reach}}_i = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{social media budget}_i + \hat{\beta}_2 \cdot \text{demographics}_i\]

They omit variables like local trust in institutions, existing health infrastructure, and cultural factors.

Their objective

Allocate budget to locations where predicted reach is highest.

Prediction vs Evaluation Require Different Models

Same Data, Different Questions

Identify high-reach locations (prediction)

  • Include all predictive variables
  • Omitted variables reduce accuracy
  • No causal interpretation needed
  • Projection model sufficient

\[\mathbb{E}[x \cdot e] = 0 \; \checkmark\]

Measure campaign effectiveness (causal)

  • Must control for confounders
  • Omitted variables create bias
  • Need causal interpretation
  • Regression model required

\[\mathbb{E}[e | x] = 0 \; \text{(required)}\]

The same organisation may need both models for different decisions

When Would OVB Matter for the NGO?

Shifting from Prediction to Causation

Prediction question (correlation sufficient)

“Where should we allocate next year’s budget for maximum reach?”

  • Historical data predicts well
  • No causal mechanism needed
  • Omitted variables reduce fit but do not create bias

Causal question (exogeneity required)

“Does increasing social media budget cause higher reach?”

  • Need to isolate budget effect
  • Omitting trust/infrastructure biases estimate
  • Policy conclusions require causal model

Always ask: “Am I trying to predict or to understand causation?”

Exercise 4: Training and Productivity

We Want to Change the World, Not Just Describe It

The Policy Motivation

The firm’s question

“If we increase training, will productivity rise?”

What we estimate

\[\text{productivity}_i = \beta_0 + \beta_1 \cdot \text{training}_i + e_i\]

What we find

\[\hat{\beta}_1 = 0.15 \quad (p < 0.01)\]

“Each hour of training associated with 0.15 unit productivity increase”

The temptation

  • Implement training programme
  • Expect 0.15-unit productivity gains per training hour
  • Justify investment to board

The problem

  • Association \(\neq\) causation
  • What if estimate is biased?

Firm Size Confounds the Relationship

The Causal Structure

Two correlations contaminate our estimate

  • Larger firms invest more in training (economies of scale in HR)
  • Larger firms are more productive (market power, better technology)

Our Estimate Captures More Than Training Effects

Decomposing What \(\hat{\beta}_1\) Measures

\[\mathbb{E}[\hat{\beta}_1] = \underbrace{\beta_1}_{\text{true training effect}} + \underbrace{\gamma_2 \cdot \delta_1}_{\text{firm size effect}}\]

where:

  • \(\gamma_2 > 0\): firm size increases productivity
  • \(\delta_1 > 0\): training correlates with firm size

\[\text{Bias} = (+) \times (+) = (+)\]

Our estimate \(\hat{\beta}_1\) overstates the true training effect
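The decomposition can be verified by simulation. The DGP below (parameter values are invented for illustration) builds in a firm-size confounder; the short-regression coefficient matches \(\beta_1 + \gamma_2 \cdot \delta_1\), where \(\delta_1\) comes from the auxiliary regression of size on training:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
size = rng.standard_normal(n)
training = 0.8 * size + rng.standard_normal(n)                # larger firms train more
productivity = 0.05 * training + 0.6 * size + rng.standard_normal(n)

# short regression: productivity on training only
b_short = np.cov(training, productivity)[0, 1] / np.var(training)
# auxiliary regression: size on training
delta1 = np.cov(training, size)[0, 1] / np.var(training)
print(b_short, 0.05 + 0.6 * delta1)  # the two agree: bias = gamma2 * delta1
```

With a true effect of only 0.05, the short regression delivers roughly 0.34, almost all of it firm-size contamination.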

Why Correlation Fails for Policy

Unless we change firm size, training alone won’t change productivity

What our estimate tells us

Firms with more training have higher productivity.

What it does NOT tell us

Giving more training will increase productivity.

The reality

  • The relationship is driven by firm size
  • Smaller firms that increase training won’t see gains
  • The causal path is: Size → Productivity and Size → Training
  • There is no direct: Training → Productivity (or it’s much smaller)

An Alternative Story Reveals the Problem

Same Data, Different Mechanism

Story A (what we assumed)

  • Training develops skills → Skills increase output → Productivity rises

Story B (what’s actually happening)

  • Large firms have resources → Resources fund training programmes
  • Large firms have market power → Market power increases profits/productivity

Both stories generate

\[\text{cov}(\text{training}, \text{productivity}) > 0\]

Without controlling for firm size, we cannot distinguish these stories

Exercise 5: Does Beauty Cause Higher Wages?

The Reverse Causality Problem

Two Plausible Causal Directions

  1. Beauty → Wages
  • Taste-based discrimination by employers
  • Customer preferences in service industries
  • Productivity benefits (confidence, social skills)
  2. Wages → Beauty
  • Higher earners afford better skincare, cosmetics
  • Access to aesthetic procedures
  • Better health investments

Cross-Sectional Data Cannot Separate These Effects

Why Exogeneity Fails

We estimate

\[\text{wage}_i = \beta_0 + \beta_1 \cdot \text{beauty}_i + e_i\]

But the error contains

\[e_i = \gamma_1 \cdot \text{wage}_i + \text{other factors}\]

Therefore

\[\mathbb{E}[e_i | \text{beauty}_i] = \mathbb{E}[\gamma_1 \cdot \text{wage}_i | \text{beauty}_i] \neq 0\]

Beauty correlates with the error because wages affect both
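A simulation of the two-equation system (coefficients chosen purely for illustration) makes the bias visible: generating beauty and wages jointly from the solved reduced form, OLS of wages on beauty overshoots the true structural effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
u = rng.standard_normal(n)  # wage shock
v = rng.standard_normal(n)  # beauty shock

# structural system: wage = 2 + 0.3*beauty + u ;  beauty = 0.2*wage + v
# solving the system gives the reduced form for beauty:
beauty = (0.4 + 0.2 * u + v) / (1 - 0.3 * 0.2)
wage = 2 + 0.3 * beauty + u

b_ols = np.cov(beauty, wage)[0, 1] / np.var(beauty)
print(b_ols)  # noticeably above the true 0.3: simultaneity bias
```

Because wages feed back into beauty, beauty is correlated with the wage shock \(u\), and OLS attributes part of that feedback to the beauty coefficient.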

A Solution: Exploit the Arrow of Time

The Future Cannot Cause the Present

The identification strategy

Compare:

  • Today’s beauty rating
  • Tomorrow’s wage change

\[\Delta\text{wage}_{i,t+1} = \beta_0 + \beta_1 \cdot \text{beauty}_{it} + e_{it}\]

Why this works

  • Future wage changes cannot affect current beauty
  • Removes reverse causality channel

What we identify

\[\beta_1 = \frac{\text{cov}(\text{beauty}_t, \Delta\text{wage}_{t+1})}{\text{var}(\text{beauty}_t)}\]

  • Effect of beauty on future wage growth
  • Temporal ordering establishes direction
  • More credible causal interpretation

Using time as a natural ordering helps establish causality

Exercise 6: Does Health Cause Higher Earnings?

The Same Simultaneity Problem

Defining the Variables Precisely

\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy}] + e_i\]

Two-Way Causation Creates Identification Failure

The Causal Structure

  1. Health → Earnings
    • Healthier workers more productive
    • Fewer absences, more energy
    • Better cognitive function
  2. Earnings → Health
    • Higher earners afford better healthcare
    • Less financial stress improves health
    • Better nutrition and living conditions

The Solution Mirrors Exercise 5

Exploit Temporal Ordering

Strategy

\[\Delta\text{earnings}_{i,t+1} = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy at } t] + e_{it}\]

Interpretation

  • Current health predicts future earnings changes
  • Future earnings changes cannot cause current health
  • Temporal precedence establishes causal direction

The arrow of time provides identification when simultaneity threatens

Exercise 7: i.i.d. Sampling vs Exogeneity

AS2 and AS5 Serve Different Purposes

Both can fail independently

AS2: Random Sampling

\[\text{cov}(y_i, y_j) = 0 \text{ for } i \neq j\]

  • About data collection
  • Ensures sample represents population
  • Affects inference (standard errors)
  • Violated by:
    • Clusters
    • Time series

AS5: Exogeneity

\[\mathbb{E}[e_i | x_i] = 0\]

  • About model specification
  • Ensures no confounding
  • Affects identification (bias)
  • Violated by:
    • Omitted variables
    • Reverse causality

Crucial Terminology: Parameters, Estimators, Estimates

Getting the Language Right

Parameter (Estimand)

\[\beta_1\]

  • Population quantity
  • Fixed, unknown value
  • Identified or not

Estimator

\[\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x,y)}{\widehat{\text{var}}(x)}\]

  • Function of data
  • Random variable
  • Biased or unbiased

Estimate

\[\hat{\beta}_1 = 0.073\]

  • Computed number
  • Single realisation
  • Not a random variable

We say “biased estimator”—never “biased parameter” or “biased estimate”

Which Assumptions Enable What?

Assumption   Statement                                                Ident.   Estim.   Infer.
AS1          Linearity: \(y = \beta_0 + \beta_1 x + e\)               ✓        ✓
AS2          Random sampling                                          ✓        ✓
AS3          Variation in \(x\): \(\text{var}(x) > 0\)                ✓        ✓
AS4          Zero mean: \(\mathbb{E}[e] = 0\)                         ✓        ✓
AS5          Exogeneity: \(\mathbb{E}[e \mid x] = 0\)                 ✓        ✓
AS6          Homoskedasticity: \(\text{var}(e \mid x) = \sigma^2\)                      ✓
AS7          Normality: \(e \sim N(0, \sigma^2)\)                                       ✓

AS1-AS5: Are we estimating something meaningful? AS6-AS7: Is our uncertainty correct?

Exercise 8: Earnings and School Desk Assignment

The Alumni Meeting Problem

\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \text{desk number}_i + e_i\]

Lower desk numbers (front of class) → higher earnings

\[\begin{align*} H_{0}: \beta_1 = 0 \\ H_{1}: \beta_1 < 0 \end{align*}\]

What seems fine

  • Desk assignment was random
  • No omitted variable bias
  • No reverse causality

The sampling method

Survey conducted at alumni meeting

Who attends alumni meetings?

  • Successful graduates
  • Those with high earnings
  • Want to showcase success

Sample Selection Along the Dependent Variable

Non-Random Sampling Creates Bias

\(\mathbb{P}[i\text{ attended meeting} | \text{earnings}_i] \text{ is increasing in earnings}\)

Why this biases our estimate

If \(\beta_1 < 0\) (front seats → higher earnings):

  • Front-seat alumni: High earnings → likely to attend
  • Back-seat alumni: Lower earnings → less likely to attend

The consequence

  • We oversample successful back-seat students (the exceptions) and undersample unsuccessful front-seat students.
  • This creates positive bias: \(\mathbb{E}[\hat{\beta}_1] > \beta_1\)

Random treatment assignment doesn’t help when sample selection depends on the outcome
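The mechanism is easy to reproduce. In this sketch (stylised numbers, not survey data), desk assignment is random and the true slope is negative, but sampling only attendees, whose probability of showing up rises with earnings, flattens the estimated slope:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
desk = rng.uniform(0, 30, n)                               # randomly assigned
earnings = 10 - 0.1 * desk + rng.standard_normal(n)        # true beta1 = -0.1
# attendance probability increasing in earnings (selection on the outcome)
attend = rng.random(n) < 1 / (1 + np.exp(-2 * (earnings - 9)))

slope = lambda a, b: np.cov(a, b)[0, 1] / np.var(a)
b_full = slope(desk, earnings)                   # ~ -0.1 with the full sample
b_sel = slope(desk[attend], earnings[attend])    # flatter: biased toward zero
print(b_full, b_sel)
```

The randomisation of desks is intact, yet the attendee-only slope is much closer to zero, exactly the positive bias described above.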

Who Shows Up to the Meeting?

Population relationship

  • Front seats: Mix of high and low earners
  • Back seats: Mix of high and low earners
  • True slope: \(\beta_1 < 0\)

What we’d estimate with random sample

\[\hat{\beta}_1 \approx \beta_1 < 0\]

Meeting attendees (selected sample)

  • Front seats: Mostly high earners (typical)
  • Back seats: Only high earners (atypical)
  • Selected slope: Flatter (closer to zero)

What we estimate

\[\hat{\beta}_1 > \beta_1\]

Bias is positive, toward zero


Measurement Error: When Variables Contain Noise

Real Data Is Never Perfectly Measured

Two Distinct Sources of Error

Measurement error in \(y\)

\[y = y^* + e_0\]

  • Observed \(y\) differs from true \(y^*\)
  • Examples: self-reported income, recalled consumption

Measurement error in \(x\)

\[x = x^* + e_1\]

  • Observed \(x\) differs from true \(x^*\)
  • Examples: ability proxied by test scores, beauty rated by juries

These two types have fundamentally different consequences

Exercise 9: Beauty and Wages

Jury Ratings of Beauty Contain Error

The Empirical Setup

Data

  • Wages: Measured precisely (administrative records)
  • Beauty: Measured with error (jury ratings)

Estimates based on 3,000 workers

\[\widehat{\text{wages}}_i = \underset{(0.66)}{3.15} - \underset{(0.33)}{0.71} \cdot \mathbb{1}[\text{beauty}_{i} < \overline{\text{beauty}}]\]

The problem

  • Juries make mistakes in both directions
    • Some truly below-average rated as average
    • Some truly average rated as below-average

Can we conclude that below-average looking people earn less?

Measurement Error in \(x\) Causes Attenuation Bias

\[\text{plim}(\hat{\beta}_1) = \beta_1 \cdot \underbrace{\frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}}_{\lambda \in (0,1)}\]

Direction preserved, magnitude reduced

  • \(\lambda\) is always between 0 and 1
    • \(\hat{\beta}_1\) is shrunk toward zero
  • More noise \(e_1\) → smaller \(\lambda\) → more shrinkage

For our beauty coefficient

  • True \(\beta_{\text{below}} < 0\) (negative effect)
  • \(\hat{\beta}_{\text{below}} = -0.71\)
  • Since \(\lambda < 1\): \(|\hat{\beta}| < |\beta|\)
  • True effect is more negative than \(-0.71\)

Our estimate is an upper bound on the true (negative) effect
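The attenuation formula is straightforward to verify by simulation (all parameter values invented): with \(\text{var}(x^*) = 4\) and \(\text{var}(e_1) = 1\), \(\lambda = 0.8\), so a true coefficient of \(-0.7\) is estimated near \(-0.56\).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
x_star = 2 * rng.standard_normal(n)     # true regressor, var(x*) = 4
e1 = rng.standard_normal(n)             # measurement noise, var(e1) = 1
x = x_star + e1                         # what we actually observe
y = 1 + (-0.7) * x_star + rng.standard_normal(n)

b_hat = np.cov(x, y)[0, 1] / np.var(x)
lam = 4 / (4 + 1)                       # attenuation factor lambda = 0.8
print(b_hat, -0.7 * lam)                # both ~ -0.56: shrunk toward zero
```

The sign survives, the magnitude does not: the noisier the measurement, the smaller \(\lambda\) and the stronger the shrinkage.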

Testing the Hypothesis Despite Measurement Error

Test setup

  • \(H_0: \beta_{\text{below}} = 0\)
  • \(H_1: \beta_{\text{below}} \neq 0\)
  • Two-sided test at \(\alpha = 0.05\)

Results

  • \(\hat{\beta}_{\text{below}} = -0.71\)
  • \(\text{SE} = 0.33\)
  • \(t = \frac{-0.71 - 0}{0.33} = -2.15\)
  • \(|t| = 2.15 > 1.96\)

Decision

  • Reject \(H_0\) at \(\alpha = 0.05\)
  • Why we’re confident:
    • Without measurement error, \(|\hat{\beta}|\) would be larger
    • \(|t|\) would be larger
    • We would reject even more strongly

Attenuation makes our test conservative

The Intuition: Attenuation Gives Us a Bound

Our Estimate Understates the True Effect

True \(\beta_1\) is more negative than our estimate—draw on whiteboard

Conservative inference

  • \(\hat{\beta}_1 = -0.71\) is closer to zero than truth
  • If we reject \(H_0: \beta = 0\), true effect is at least this large

Exercise 10: Education and Wages

Measurement Error in Both Variables

The Setup

\[\widehat{\log(\text{wages})}_i = \underset{(1.3)}{3.2} + \underset{(0.2)}{0.4} \cdot \text{education}_i\]

\(n = 2{,}000\)

Sources of error

  • Wages: Self-reported (recall error)
  • Education: Misremembered years

Three parts

  1. Test \(H_0: \beta_{\text{educ}} = 0\) vs \(H_1: \beta_{\text{educ}} > 0\)
  2. Effect of less error in wages?
  3. Effect of less error in education?

Part (a): Testing Education Effect

One-Sided Hypothesis Test

Test setup

  • \(H_0: \beta_{\text{educ}} = 0\)
  • \(H_1: \beta_{\text{educ}} > 0\) (one-sided—theory predicts positive)
  • One-sided test at \(\alpha = 0.05\)
  • Critical value \(c = 1.645\)

Results

  • \(\hat{\beta}_{\text{educ}} = 0.4\)
  • \(\text{SE} = 0.2\)
  • \(t = \frac{0.4 - 0}{0.2} = 2.0\)
  • \(t = 2.0 > 1.645\)
  • Decision: Reject \(H_0\) at \(\alpha = 0.05\)
  • Interpretation: Significant evidence that education increases wages.

Part (b): Less Error in Wages

Measurement Error in \(y\) Affects Precision, Not Bias

When \(y = y^* + e_0\), we estimate:

\[y = \beta_0 + \beta_1 x^* + \underbrace{(e + e_0)}_{\upsilon}\]

Effect on variance

\[\text{var}(\upsilon) = \text{var}(e) + \text{var}(e_0)\]

  • Composite error has larger variance
  • \(\hat{\sigma}^2\) larger → SE larger → \(t\) smaller

If wages measured with less error

  • \(\text{var}(e_0) \downarrow\)
  • \(\text{var}(\upsilon) \downarrow\)
  • SE \(\downarrow\)
  • \(t \uparrow\)
  • Still reject, with lower \(p\)-value

Error in \(y\) does NOT bias \(\hat{\beta}\)—only inflates variance
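A short simulation (illustrative values) confirms both claims: the slope is unchanged when \(y\) is measured with noise, while the residual variance, and hence the SE, grows.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
x = rng.standard_normal(n)
y_star = 1 + 0.5 * x + rng.standard_normal(n)   # true outcome
y = y_star + 1.5 * rng.standard_normal(n)       # noisy measurement of y

slope = lambda a, b: np.cov(a, b)[0, 1] / np.var(a)
print(slope(x, y_star), slope(x, y))            # both ~0.5: no bias
# residual variance (hence SE) is inflated by the measurement noise:
print(np.var(y_star - 0.5 * x), np.var(y - 0.5 * x))
```

The composite error variance is \(\text{var}(e) + \text{var}(e_0)\), here roughly \(1 + 2.25\), so inference stays valid but confidence intervals widen.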

Part (c): Less Error in Education

Measurement Error in \(x\) Affects Both Bias and Variance

\[\text{plim}(\hat{\beta}_1) = \beta_1 \cdot \frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}\]

If education measured with less error

  • \(\text{var}(e_1) \downarrow\)
  • Attenuation factor \(\lambda \uparrow\) (closer to 1)
  • \(\hat{\beta}_1 \uparrow\) (closer to true \(\beta_1\))

Effect on test

  • Since \(\beta_1 > 0\): estimate increases
  • \(t = \hat{\beta}_1 / \text{SE}\) increases
  • Still reject, with lower \(p\)-value

Both types of reduced error strengthen our conclusion

Exercise 11: Error in \(y\) vs Error in \(x\)

A Fundamental Distinction

“Measurement error in the independent variable is a serious problem. Measurement error in the dependent variable is not.” ✅

                      Error in \(y\)                        Error in \(x\)
What happens          \(\text{var}(\upsilon) \uparrow\)     Attenuation bias
Bias?                 No                                    Yes (toward zero)
Efficiency?           Reduced                               Reduced
Assumption violated   None                                  AS5 (exogeneity)

Why Error in \(y\) Doesn’t Cause Bias

Exogeneity preserved → OLS unbiased (full derivation in appendix)

Setup

  • True: \(y^* = \beta_0 + \beta_1 x^* + e\)
  • Observed: \(y = y^* + e_0\) where \(\text{cov}(e_0, x^*) = 0\)

Substitution

\[y = \beta_0 + \beta_1 x^* + \underbrace{(e + e_0)}_{\upsilon}\]

Check exogeneity

\[\text{cov}(x^*, \upsilon) = \text{cov}(x^*, e) + \text{cov}(x^*, e_0) = 0 + 0 = 0 \; \checkmark\]

Why Error in \(x\) Causes Bias

Exogeneity violated → OLS biased (full derivation in appendix)

Setup

  • True: \(y^* = \beta_0 + \beta_1 x^* + e\)
  • Observed: \(x = x^* + e_1\), so \(x^* = x - e_1\)

Substitution

\[\begin{align*} y^{*} &= \beta_0 + \beta_1(x - e_1) + e \\ &= \beta_0 + \beta_1 x + \underbrace{(e - \beta_1 e_1)}_{\upsilon} \end{align*}\]

Check exogeneity

\[\begin{align*} \text{cov}(x, \upsilon) &= \text{cov}(x^* + e_1, e - \beta_1 e_1) \\ &= -\beta_1 \text{var}(e_1) \neq 0 \; \text{✗} \end{align*}\]

Deriving the Attenuation Factor

From Violated Exogeneity to Bias Formula (full derivation in appendix)

\[\text{plim}(\hat{\beta}_1) = \frac{\text{cov}(x, y)}{\text{var}(x)} = \frac{\beta_1 \text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}\]

The attenuation factor

\[\lambda = \frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}\]

  • Signal-to-total-variance ratio
  • More noise → smaller \(\lambda\)

Properties

  • As \(\text{var}(e_1) \to 0\): \(\lambda \to 1\)
  • As \(\text{var}(e_1) \to \infty\): \(\lambda \to 0\)
  • Bias is predictable in direction
  • Estimate always toward zero

Exercise 12: Why Attenuation Isn’t Catastrophic

The Silver Lining of Measurement Error in \(x\)

Why This Bias Is Manageable

What we know for certain

\[|\text{plim}(\hat{\beta}_1)| < |\beta_1|\]

  • Estimate closer to zero than truth
  • Sign of true effect preserved
  • Magnitude underestimated

What this means for testing

If we reject \(H_0: \beta_1 = 0\):

  • True effect definitely exists
  • True effect at least as large as estimate
  • We have a lower bound on \(|\beta|\)

Attenuation provides interpretable bounds—rejection is strong evidence

Summary: Measurement Error

Two Problems, Two Different Severities

Aspect                        Error in \(y\)            Error in \(x\)
Primary effect                ↑ variance                Bias toward zero
Estimator property            Unbiased, inefficient     Biased, but bounded
Can trust \(\hat{\beta}\)?    Yes                       Direction yes, magnitude no
Can trust rejection?          Yes                       Yes (conservative)

Error in \(x\) more serious, but attenuation gives us something useful

Heteroskedasticity: Non-Constant Error Variance

A Different Kind of Problem

Not About Bias—About Inference

Homoskedasticity (AS6)

\[\text{var}(e_i | x_i) = \sigma^2 \quad \forall i\]

  • Same variance everywhere
  • One parameter describes all errors

Heteroskedasticity

\[\text{var}(e_i | x_i) = \sigma_i^2\]

  • Variance depends on \(x\)
  • Different spread at different values

This doesn’t bias OLS—but it breaks our standard errors

Exercise 13: Firm Profitability and Sales

The Model and Interpretation

Setup

\[\text{profits}_i = \beta_0 + \beta_1 \cdot \log(\text{sales}_i) + e_i\]

where profits measured in millions of dollars.

Interpretation of \(\hat{\beta}_1\)

  • Log-level specification
  • 1% ↑ in sales → \(\hat{\beta}_1 / 100\) million more profit
  • Or: 1% ↑ sales → \(\hat{\beta}_1 \times 10{,}000\) dollars more profit

The heteroskedasticity concern

Larger firms likely have more variable profits:

\[\text{var}(e_i | \text{sales}_i) = \sigma_i^2\]

increasing in sales

OLS Remains Unbiased Under Heteroskedasticity

Coefficients are right—but we don’t know how precise they are

Unbiasedness requires AS1-AS5 only

  1. ✓ Linearity
  2. ✓ Random sampling
  3. ✓ Variation in \(x\)
  4. ✓ Zero mean: \(\mathbb{E}[e] = 0\)
  5. ✓ Exogeneity: \(\mathbb{E}[e|x] = 0\)

None involve error variance

\(\therefore\) OLS still unbiased

What requires AS6

  • Variance formula for \(\hat{\beta}\)
  • Standard errors
  • \(t\)-statistics
  • Confidence intervals
  • \(p\)-values

All inference machinery
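A Monte Carlo sketch (hypothetical DGP) shows exactly this split: the average of \(\hat{\beta}_1\) across samples sits at the truth, but the usual SE formula understates the actual sampling spread when \(\text{var}(e|x)\) rises with \(|x|\).

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, beta1 = 200, 2000, 1.0
b_hats, naive_ses = [], []
for _ in range(reps):
    x = rng.standard_normal(n)
    e = np.abs(x) * rng.standard_normal(n)   # error variance grows with |x|
    y = beta1 * x + e
    xd = x - x.mean()
    b = (xd @ y) / (xd @ xd)                 # OLS slope
    resid = y - y.mean() - b * xd
    naive_ses.append(np.sqrt(resid @ resid / (n - 2) / (xd @ xd)))
    b_hats.append(b)

print(np.mean(b_hats))                       # ~1.0: still unbiased
print(np.std(b_hats), np.mean(naive_ses))    # true spread vs too-small naive SE
```

Here the naive SE is well below the actual standard deviation of the estimates, so \(t\)-statistics computed from it would be overconfident.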

Exercise 14: What Goes Wrong and The Solution

The Standard Variance Formula Assumes Constant Variance

Under Homoskedasticity

\[\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\]

This formula requires

  • One \(\sigma^2\) for all observations
  • We estimate \(\hat{\sigma}^2 = \frac{1}{n-k-1}\sum \hat{\varepsilon}_i^2\)
  • Plug in and compute SE

Under heteroskedasticity

  • No single \(\sigma^2\) exists
  • Using wrong formula
  • SE incorrect
  • All inference invalid

The True Variance Under Heteroskedasticity

A More General Formula

\[\text{var}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{\left[\sum_{i=1}^n (x_i - \bar{x})^2\right]^2}\]

What this formula says

  • Each observation weighted by \((x_i - \bar{x})^2\)
  • Each observation has own \(\sigma_i^2\)
  • Cannot simplify to standard formula

The problem

  • We don’t know individual \(\sigma_i^2\)
  • Standard software computes wrong thing
  • Need different approach

What Happens If We Use Wrong Formula?

Direction of Error Depends on Pattern

If variance increases with \(|x - \bar{x}|\):

  • Standard SE too small
  • \(t\)-statistics too large
  • \(p\)-values too small
  • Reject \(H_0\) too often → overconfident

If variance decreases with \(|x - \bar{x}|\):

  • Standard SE too large
  • \(t\)-statistics too small
  • \(p\)-values too large
  • Fail to reject too often → underpowered

Using wrong SE can go either direction—we can’t know without checking

Robust Standard Errors: The Solution

Heteroskedasticity-Consistent Variance Estimator

\[\widehat{\text{var}}_{\text{robust}}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \hat{\varepsilon}_i^2}{\left[\sum_{i=1}^n (x_i - \bar{x})^2\right]^2}\]

How it works

  • Uses \(\hat{\varepsilon}_i^2\) to estimate \(\sigma_i^2\)
  • Each residual estimates its own variance
  • No constant variance assumption needed

Properties

  • Consistent under heteroskedasticity
  • Valid under homoskedasticity too
  • Always safe to use
  • Standard in applied work
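The robust formula above can be implemented directly. In this sketch (simulated heteroskedastic data, invented parameters), the heteroskedasticity-consistent SE comes out noticeably larger than the naive one, matching the "SE too small" case:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
x = rng.standard_normal(n)
y = 1.0 * x + np.abs(x) * rng.standard_normal(n)   # heteroskedastic errors

xd = x - x.mean()
b = (xd @ y) / (xd @ xd)                           # OLS slope
resid = y - y.mean() - b * xd

# naive SE: assumes one sigma^2 for all observations
se_naive = np.sqrt(resid @ resid / (n - 2) / (xd @ xd))
# robust SE: each squared residual estimates its own sigma_i^2
se_robust = np.sqrt((xd**2 @ resid**2) / (xd @ xd) ** 2)
print(se_naive, se_robust)
```

Since the robust estimator is also consistent under homoskedasticity, there is little cost to using it by default in cross-sectional work.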

Summary

Six Lessons About LR Limitations

1. Projection requires \(\mathbb{E}[xe] = 0\), while regression requires \(\mathbb{E}[e|x] = 0\)

  • OVB irrelevant for forecasting

2. Correlation of \(z\) with both \(x\) and \(y\) creates confounding

  • Policy based on correlation will fail

3. Reverse causality violates AS5

  • Simultaneity:
    • beauty ↔︎ wages
    • health ↔︎ earnings
  • One solution: exploit temporal ordering

4. Non-random sampling violates AS2

  • Selection on \(y\) distorts relationship

5. Terminology matters

  • Parameters: identified or not
  • Estimators: biased or unbiased
  • Estimates: neither

6. Random sampling affects inference, while exogeneity affects identification

  • Both can fail independently

Four Key Lessons

What We Learned Today

Measurement Error

  1. Error in \(y\): Increases variance, no bias
    • OLS unbiased but inefficient
    • Valid inference with larger SE
  2. Error in \(x\): Causes attenuation bias
    • Estimate shrunk toward zero
    • But gives interpretable bounds
    • Rejection is conservative

Heteroskedasticity

  1. Effect: Doesn’t bias OLS
    • Only affects inference
    • Standard SE formula wrong
  2. Solution: Robust standard errors
    • Valid whether or not heteroskedasticity present
    • Always use in cross-sections

Next Class: Exercises 9-14

Measurement Error, Heteroskedasticity, and Applications

What we’ll cover

  • Measurement error: Bias when \(x\) measured with noise
  • Heteroskedasticity: Non-constant error variance

Why it matters

  • Real data always contains measurement error
  • Heteroskedasticity ubiquitous in cross-sections
  • Robust standard errors as practical solution
  • Distinguishing bias problems from inference problems

Understanding which assumption fails guides the solution

Appendix: Mathematical Derivations

Error in \(y\): Full Derivation

Proving OLS Remains Unbiased

Setup

  • True model: \(y^* = \beta_0 + \beta_1 x^* + e\) with \(\mathbb{E}[e|x^*] = 0\)
  • Observed: \(y = y^* + e_0\)
  • Assume: \(\text{cov}(e_0, x^*) = 0\) and \(\text{cov}(e_0, e) = 0\)

Substitution

\[\begin{align*} y &= y^* + e_0 \\ &= \beta_0 + \beta_1 x^* + e + e_0 \\ &= \beta_0 + \beta_1 x^* + \upsilon \end{align*}\]

where \(\upsilon = e + e_0\)

Error in \(y\): Full Derivation (continued)

Checking exogeneity

\[\begin{align*} \text{cov}(x^*, \upsilon) &= \text{cov}(x^*, e + e_0) \\ &= \text{cov}(x^*, e) + \text{cov}(x^*, e_0) \\ &= 0 + 0 = 0 \; \checkmark \end{align*}\]

Exogeneity preserved → OLS unbiased \(\blacksquare\)


Error in \(x\): Full Derivation

Proving OLS Becomes Biased

Setup

  • True model: \(y^* = \beta_0 + \beta_1 x^* + e\) with \(\mathbb{E}[e|x^*] = 0\)
  • Observed: \(x = x^* + e_1\), so \(x^* = x - e_1\)
  • Assume: \(\text{cov}(e_1, x^*) = 0\) and \(\text{cov}(e_1, e) = 0\)

Substitution

\[\begin{align*} y^* &= \beta_0 + \beta_1(x - e_1) + e \\ &= \beta_0 + \beta_1 x - \beta_1 e_1 + e \\ &= \beta_0 + \beta_1 x + \upsilon \end{align*}\]

where \(\upsilon = e - \beta_1 e_1\)

Error in \(x\): Full Derivation (continued)

Checking Exogeneity

\[\begin{align*} \text{cov}(x, \upsilon) &= \text{cov}(x^* + e_1, e - \beta_1 e_1) \\ &= \text{cov}(x^*, e) - \beta_1\text{cov}(x^*, e_1) + \text{cov}(e_1, e) - \beta_1\text{cov}(e_1, e_1) \\ &= 0 - \beta_1 \cdot 0 + 0 - \beta_1 \text{var}(e_1) \\ &= -\beta_1 \text{var}(e_1) \end{align*}\]

Since \(\text{var}(e_1) > 0\) and typically \(\beta_1 \neq 0\):

\[\text{cov}(x, \upsilon) \neq 0 \; \text{✗}\]

Exogeneity violated → OLS biased \(\blacksquare\)


Attenuation Factor: Full Derivation

Deriving \(\text{plim}(\hat{\beta}_1)\)

OLS estimator converges to

\[\text{plim}(\hat{\beta}_1) = \frac{\text{cov}(x, y)}{\text{var}(x)}\]

Compute numerator

\[\begin{align*} \text{cov}(x, y) &= \text{cov}(x^* + e_1, \beta_0 + \beta_1 x^* + e) \\ &= \beta_1 \text{cov}(x^* + e_1, x^*) \\ &= \beta_1 [\text{var}(x^*) + \text{cov}(e_1, x^*)] \\ &= \beta_1 \text{var}(x^*) \end{align*}\]

Compute denominator

\[\text{var}(x) = \text{var}(x^* + e_1) = \text{var}(x^*) + \text{var}(e_1)\]
